Tessellation and Clustering by Mixture Models and Their Parallel Implementations

Authors

  • Qiang Du
  • Xiaoqiang Wang
Abstract

Clustering and tessellations are basic tools in data mining. The k-means and EM algorithms are two of the most important algorithms in mixture model-based clustering and tessellation. In this paper, we introduce a new clustering strategy that shares common features with both the EM and k-means algorithms. Our methods also lead to more general tessellations of a spatial region with respect to a continuous and possibly anisotropic density distribution. Moreover, we propose probabilistic methods for constructing these clusterings and tessellations for a continuous density distribution. Numerical examples are presented to demonstrate the effectiveness of the new approach. We also discuss the parallel implementation and performance of our algorithms on distributed memory systems.
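
For readers unfamiliar with the two algorithms the abstract names, the following minimal sketch contrasts one k-means (Lloyd) update with one EM update on one-dimensional data. It is not the authors' new strategy, which this page does not describe. The function names, the sample data, and the simplifying assumptions (equal mixture weights, a fixed shared standard deviation `sigma`, only the means updated) are illustrative choices made here.

```python
import math

def kmeans_step(points, centers):
    """One Lloyd iteration: hard-assign each point to its nearest center, then re-average."""
    groups = [[] for _ in centers]
    for x in points:
        nearest = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
        groups[nearest].append(x)
    # Keep the old center if a cluster received no points.
    return [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]

def responsibilities(x, centers, sigma):
    """E-step weights, assuming equal mixture weights and a fixed shared sigma."""
    expo = [-((x - c) ** 2) / (2.0 * sigma ** 2) for c in centers]
    shift = max(expo)  # subtract the largest exponent to avoid underflow
    dens = [math.exp(e - shift) for e in expo]
    total = sum(dens)
    return [d / total for d in dens]

def em_step(points, centers, sigma=1.0):
    """One EM iteration that updates only the means (weights and sigma held fixed)."""
    num = [0.0] * len(centers)
    den = [0.0] * len(centers)
    for x in points:
        for j, r in enumerate(responsibilities(x, centers, sigma)):
            num[j] += r * x  # soft assignment: every point contributes to every mean
            den[j] += r
    return [n / d if d > 0 else c for n, d, c in zip(num, den, centers)]

# Hypothetical sample data: two well-separated groups on the real line.
points = [0.0, 0.2, 0.4, 4.0, 4.2, 4.4]
hard = soft = [1.0, 3.0]
for _ in range(20):
    hard = kmeans_step(points, hard)
    soft = em_step(points, soft)
print("k-means centers:", hard)
print("EM means:", soft)
```

The only structural difference between the two updates is the assignment: k-means gives each point to exactly one cluster, whereas EM spreads each point across all clusters in proportion to its responsibilities, so the EM means are pulled slightly toward each other.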

Similar resources

Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models

Model-based clustering using a family of Gaussian mixture models, with parsimonious factor analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented. This algorithm uses the alternating expectation-conditional maximization (AECM) variant of the expectation-maximization (EM) algorithm. Two central issues around the implementation of this famil...

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields

This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...

Scalable Data Clustering using GPU Clusters

The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry data, is very time consuming on a single CPU. Fortunately these techniques lend themselves naturally to large scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA’s CUDA framework and Tesla arch...

Publication date: 2004